.TH ANALYSESEQS 1
.SH NAME
AnalyseSeqs \- Analyse a set of sequences of common length 
.SH SYNOPSIS
\fBAnalyseSeqs [\-X[\fIbswn\fP]] [\-Q] [\-M{mask}[+|!]] [\-D{H|A|G}] [\-d{S|H|D|B}]
.SH DESCRIPTION
.I AnalyseSeqs
reads a set of sequences from stdin and tries a variety of methods
for sequence analysis on them. Currently available are:
.br 
Statistical geometry for quadruples of sequences; THIS IS 
PRELIMINARY AND NOT WELL TESTED BY NOW.
.br
split decomposition;
neighbour joining and Ward's variance method for reconstructing
phylogenies using various distance measures. 
For statistical geometry and the cluster methods PostScript output
is available.
.br
The program continues reading until it encounters one of the
separator characters '@' or '%'. Only sequences of alphabetical
characters or of a specified alphabet are processed, all other
lines are ignored. The program stops reading
if it either encounters an EOF condition, or if there are no
valid sequence data between two lines beginning with separator
characters. 
.br
A list of taxa names can be specified in the input stream. The list 
begins with a line beginning with '*'. Optionally, a file name prefix
[fn] for the PostScript output can be specified in this line.
The entries have the form 'x : Taxon',
where x is the number of taxon, i.e., of the corresponding entry in 
the list of input sequences. The taxa list need not be complete. It must
end, however, with a line beginning with '*' or any of the separator
characters. The taxa list is printed on top of the output. The specified
taxa names are used as labels in the PostScript output. 

.SH OPTIONS
.IP \fB\-X[bswn]\fI\fP
specifies the analysis methods to be used. 
.IP \fB[b]\fI\fP 
Statistical Geometry. A PostScript file named '[fn_]box.ps' giving a 
graphical representation of the statistical geometry is created. The
resulting box is a good measure of 'tree likeness' of the data set.
This is the default.
.IP \fB[s]\fI\fP 
Split decomposition. 
.IP \fB[w]\fI\fP 
Cluster analysis using Ward's method. A PostScript file named '[fn_]wards.ps' 
is created containing a drawing of the tree. 
.IP \fB[n]\fI\fP
Cluster analysis using Saitou's neighbour joining method. A PostScript 
file named '[fn_]nj.ps' is created containing a drawing of the tree. 

.IP \fB\-Q\fB
indicates that a statistical geometry analysis is to be performed
comparing four data sets, for instance to confirm the significance of
a proposed phylogeny. This option is only useful for statistical
geometry analysis and hence the -X option is ignored. Each of the 
four data sets must be of the form
.br
* [filename_prefix]
.br
# number
.br
[list of taxa names]
.br
*
.br
list of sequences 
.br
%
.br
where number is 1,2,3,4 for the four groups to be compared.

.IP \fB\-M{mask}[+|!]\fB
allows one to specify a mask for the input file. '{mask}' can be one 
of the following letters indicating a predefined alphabet or 
the %-sign followed by all characters to be accepted. A + sign
at the very end of the mask indicates that the input is to be 
handled case sensitive. Default is conversion of the input to
upper case. A ! sign can be used to convert the input data to 
RY code: GgAaXx -> R, UuCcKkTt -> Y, all other letters are 
converted to *.  
.IP \fB-Ma\fI\fP 
all letters A-Z and a-z.
.IP \fB-Mu\fI\fP 
uppercase letters.
.IP \fB-Ml\fI\fP
lowercase letters.
.IP \fB-Mc\fI\fP
digits [0-9].
.IP \fB-Mn\fI\fP
all alphanumeric characters.
.IP \fB-MR\fI\fP
RNA alphabet (GCAUgcau).
.IP \fB-MD\fI\fP
DNA alphabet (GCATgcat).
.IP \fB-MA\fI\fP
Amino acids in one-letter code.
.IP \fB-MS\fI\fP
Secondary strcutures coded as '^.()'
.IP \fB-M%alphabet\fI\fP
use the specified alphabet.

.IP \fB\-D\fB
specifies the algorithm to be used for calculating the 
distance matrix of the input data set. Available are
.IP \fB-DH\fI\fP 
Hamming Distance
.IP \fB-DA[,cost]\fI\fB
Simple alignment distance according to Needleman and Wunsch.
A gap cost different from 1. can be specified after the comma.
.IP \fB-DG[,cost1,cost2]\fI\fB
Gotoh's distance with gap cost function 
g(k) = cost2+cost1*(k-1). cost2<=cost1 has to be fulfilled.
Default values are cost1=1., cost2=1., yielding the same
distance as option A. 
.br
ONLY THE HAMMING DISTANCE IS WELL TESTED BY NOW !!!

.IP \fB\-d\fB
specifies the edit cost matrix to be used. Available are
.IP \fB-dS\fI\fP 
simple distance. Indel and substitution of different characters
all have cost 1. The indel cost can be set by specifying the 
gap costs with the algorithm options -DA and -DG. This is the 
default. 
.IP \fB-dH\fI\fP 
A distance matrix for RNA secondary structures. Inspired by 
Hogeweg's similarity measure (J.Mol.Biol 1988).
Gap-function is set automatically. 
.IP \fB-dD\fI\fP 
Dayhoff's matrix for amino acid distances. 
.IP \fB-dB\fI\fP
Distinguish purines and pyrimidines only. 
CAUTION this option of course influences only the calculation of distances.
It does NOT affect computation of the statistical geometry. This is
done directly on the sequences. If you want to do statistical geometry on
RY sequences use the ! sign with the -M option, for instance -MR!.

.SH REFERENCES
The method of statistical geometry has been introduced by 
M. Eigen, R. Winkler-Oswatitsch and A.W.M. Dress 
(Proc Natl Acad Sci, 85:1988,5912).
The method of split decomposition was proposed by 
H.J. Bandelt and A.W.M. Dress
(Adv Math, 92:1992,47).
The variance method for cluster analysis is due to H.J. Ward 
(J Amer Stat Ass, 58:1963,236).
The neighbour joining method was published by Saitou and Nei
(Mol Biol Evol, 4:1987,406). 

This program is part of the Vienna RNA Package

.SH WARNING 
This is the beta test version. Some options or combinations
of options may still produce nonsense. Please send bug reports to 
ivo@tbi.univie.ac.at.

.SH VERSION
This man page is part of the Vienna RNA Package version 1.2.
.SH AUTHOR
Peter F Stadler, Ivo L. Hofacker.
.SH BUGS
Comments should be sent to ivo@itc.univie.ac.at.
.br
